Quilting Stochastic Kronecker Product Graphs to Generate 
Multiplicative Attribute Graphs 



Hyokun Yun S.V. N. Vishwanathan 

Department of Statistics Department of Statistics and Computer Science 

F^irdue University Purdue University 



Abstract 

We describe the first sub-quadratic sampling al- 
gorithm for the Multiplicative Attribute Graph 
Model (MAGM) of Kim and Leskovec (2010). 
We exploit the close connection between MAGM 
and the Kronecker Product Graph Model 
(KPGM) of Leskovec et al. (2010), and show 
that to sample a graph from a MAGM it suf- 
fices to sample small number of KPGM graphs 
and quilt them together Under a restricted set 
of technical conditions our algorithm runs in 
O ((log2 \E\) time, where n is the number 
of nodes and is the number of edges in the 
sampled graph. We demonstrate the scalability 
of our algorithm via extensive empirical evalua- 
tion; we can sample a MAGM graph with 8 mil- 
lion nodes and 20 billion edges in under 6 hours. 



1 Introduction 

In this paper we are concerned with statistical models on 
graphs. Typically one is interested in two aspects of graph 
models, namely scalable inference and efficiency in gener- 
ating samples. While scalable inference is very important, 
an efficient sampling algorithm is also critical: 

• To assess the goodness of fit, one generates graphs 
from the model and compares graph statistics of the 
samples with the original graph (Hunter et al. 2008). 

• To test whether a certain motif is overrepresented in 
the graph, one strategy is to sample large number of 
graphs in the null hypothesis to approximate the p- 
value of the test statistic (Shen-Orr et al. 2002). 
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• To predict the future growth of the graph, one may fit 
the model on the current graph and generate a larger 
graph with the estimated parameters. 

The stochastic Kronecker Product Graph Model (KPGM) 
was introduced by Leskovec et al. (2010) as a scalable sta- 
tistical model on graphs. Compared to previous mod- 
els such as Exponential Random Graph Models (ERGMs) 
(Robins et al. 2007) or latent factor models (Hoff 2009), 
KPGM sports a number of advantages. In particular, the 
inference algorithm of KPGM is scalable to very large 
graphs, and sampling a graph from the model takes time 
that is proportional to the expected number of edges. This 
makes the KPGM a very attractive model to study. How- 
ever, Moreno and Neville (2009) report that KPGM fails to 
capture some characteristics of real-world graphs, such as 
power-law degree distribution and local clustering. 

In order to address some of the above shortcom- 
ings and to generalize the expressiveness of KPGM, 
Kim and Leskovec (2010) recently proposed the Mul- 
tipUcative Attribute Graph Model (MAGM). MAGM 
can provably model the power-law degree distribution 
while KPGM can not. Furthermore, Kim and Leskovec 
(2010) demonstrate empirically that MAGM is better 
able to capture graph statistics of real-world graphs 
(Kim and Leskovec 2011). The issue of inference for 
MAGM is addressed by Kim and Leskovec (2011) via an 
efficient variational EM algorithm. 

However, the issue of efficiently sampling graphs from 
MAGM remains open. Currently, all algorithms that we 
are aware of scale as 0{n^) in the worst case, where n 
is the number of nodes. This is because, the probability 
of observing an edge between two nodes is determined by 
the so-called n x n edge probability matrix. Unlike the 
case of KPGM where the edge probability matrix has a 
Kronecker structure which can be exploited to efficiently 
sample graphs in expected 0(log2(n)|i?|) time, where \E\ 
is the number of edges, no such results are known for 
MAGM. In the absence of any structure, naively sam- 
pling each entry of the adjacency matrix requires 0{n^) 
Bernoulli trials, which is prohibitively expensive for gener- 
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ating real-world graphs with millions of nodes. 

In this paper we show that under a restricted set of technical 
conditions, with high probability, a significant portion of 
the edge probability matrix of MAGMs is the same as that 
of KPGMs (modulo permutations). We then exploit this 
observation to quilt 0((log2(n))^) graphs sampled from a 
KPGM to form a single sample from a MAGM. The ex- 
pected time complexity of our sampling scheme is thus 
0{{\og,{n)r\E\). 

1.1 Notation and Preliminaries 

A graph G consists of an ordered set of n nodes V = 
{1,2,..., n}, and a set of directed edges E C V xV. A 
node i is said to be a neighbor of another node j if they are 
connected by an edge, that is, if £ E. Furthermore, 
for each edge (i, j), i is called the source node of the edge, 
and j the target node. We define the adjacency matrix of 
graph G as the n x n matrix A with Aij — 1 if {i, j) ^ E, 
and otherwise. Also, the following standard definition of 
Kronecker product is used (Bernstein 2005): 



Definition 1 Given real matrices X G 
the Kronecker product X <^Y E 



and Y £ 
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The k-th Kronecker power is 0f^iX 

2 Kronecker Product Graph Model 

The Stochastic Kronecker Product Graph Model of 
Leskovec et al. (2010) is usually parametrized by a 2 x 2 
initiator matrix 



e := 



700 

9io 



701 

9ii 



(1) 



with each 6ij e [0, 1], and a size parameter d G Z+. For 
simplicity, let the number of nodes n be 2"^. The case where 
n < 2'^ can be taken care by discarding (2'' — n) number 
of nodes later as discussed in Leskovec et al. (2010), but in 
our context we will only use n — 2'^ for KPGM. 

The n X n edge probability matrix P is defined as the d-th 
Kronecker power of the parameter 0, that is. 



P = el^^l =909' 



9. 



(2) 



d times 



The probability of observing an edge between node i and j 
is simply the (z, j)-th entry of P, henceforth denoted as Pij. 
See Figure 1 (left) for an example of the edge probability 
matrix of a KPGM, and observe its fractal structure which 
follows from the definition of P. 



Figure 1: Examples of edge probability matrix (Left: 
KPGM, Right: MAGM). Darker cells imply a higher prob- 
ability of observing an edge. On the left, one can see the 
fractal-like Kronecker structure; dividing the matrix into 
four equal parts yields four sub-matrices which up to a scal- 
ing look exactly the same. 



Note that one can generalize the model in two ways 
(Leskovec et al. 2010). First, one can use larger ini- 
tiator matrices. Second, different initiator matrices 
0(1) 0(2) ^Crf) (.^jj used at each level. In this case, 
the edge probability matrix becomes 



P = 9(i)®9(2)®---®9('^). 



(3) 



In this paper we will work with (3) because it is closer in 
spirit to the MAGM which we will introduce later. For 
notational simplicity, we will denote 



9:= {9(1), 9(2),..., 9(^^)1 



(4) 



2.1 Sampling 



The straightforward way to sample a graph from a KPGM 
is to individually sample each entry Aij of the adjacency 
matrix A independently. This is because, given the parame- 
ter matrix 9, the event of observing an edge between nodes 
i and j is independent of observing an edge between nodes 
i' and j' for 7^ Thus one can view the adja- 

cency matrix Aas an x n array of independent Bernoulli 
random variables, with P {Aij 1 | 9) 
algorithm requires 0{n^) effort. 



Pij. Such an 



However, the Kronecker structure in the edge probability 
matrix P can be exploited to sample a graph in expected 
0(log2(n)|£'|) time. The idea of Algorithm 1 suggested 
by Leskovec et al. (2010) is as follows: 



The algorithm first determines the number of edges in the 
graph. Since the number of edges \E\ — J2i j ^ij '^he 
sum of independent Bernoulli random variables, it ap- 
proximately follows the normal distribution for large n. 
Thus, one can sample the number of edges according to 
this normal distribution. 

Next, the algorithm samples each individual edge accord- 
ing to the following recursive scheme. Let S denote the set 
of candidate nodes for the source and T the candidate target 
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nodes. Initially both S and T are {1,2,..., n}. Using the 
matrix 8, the proportion of expected number of edges in 
each quadrisection (north-west, north-east, south-west and 
south-east) of the adjacency matrix can be computed via 



(a+l)n/2 (6+l)ri/2 
i=an/2+l j=bn/2+l 



q(l) 



< a,6 < 1. 



(5) 



Then one can sample a pair of integers (a, 6), < a,b < 1, 
with the probability of (a, b) proportional to 9ab, to reduce 
S and T to {an/2 + 1, an/2 + 2, . . . , (a + l)n/2} and 
{bn/2 + 1, bn/2 + 2,...,{b + l)n/2}, respectively. Due 
to the Kronecker structure of the edge probability matrix 
P, repeating this quadrisection procedure d times reduces 
both S and T to single nodes S = {i} and T = {j}, which 
are now connected by an edge. There is a small non-zero 
probability that the same edge is sampled multiple times. 
In this case the generated edge is rejected and a new edge 
is sampled (see pseudo-code in Algorithm 1). 

Algorithm 1 Sampling Algorithm of Stochastic Kronecker 
Graphs 



1: 
2: 
3: 
4: 
5: 
6 
7: 
8: 
9 
10 
11 
12: 
13 
14; 
15 
16; 
17; 
18; 
19; 
20; 



procedure KPGMSample(9) 



Generate X ^ Af{m, m 
for x = 1 to X do 

Sstart^Tstart ^ 1 



v) 



-'end-i en< 



Tend ^ n 



for fc 1 to d do 

Sample (a, b) oc o'"^ 

^ start Sstart + (^)- 
Tstart Tstart + & (^)- 
Send Send — (1 — a) (^). 
Tend ^ Tend ^ ^ b) (t 

end for 

# We have Sstart — Send, Tstart 
E ^ EU {{Sstart, Tstart)} 

end for 
return E 
end procedure 



^ 2*= , 



Te, 



3 Multiplicative Attribute Graph Model 

An alternate way to view KPGM is as follows: Associate 
the i-th node with a bit- vector b{i) of length log2(n) such 
that bk{i) is the fc-th digit of integer {i — 1) in its binary 
representation. Then one can verify that the (z, j)-th entry 
of the edge probabiUty matrix P in (3) can be written as 



fe=i 



9(fc) 

bk(i) bk(j)' 



(6) 



Under this interpretation, one may consider bk{i) — 1 
(resp. bk{i) — 0) as denoting the presence (resp. absence) 
of the fc-th attribute in node i. The factor 6'['^'. , , , ., de- 
notes the probability of an edge between nodes i and j 
based on the value of their fc-th attribute. The attributes 
are assumed independent, and therefore the overall prob- 
ability of an edge between i and j is just the product of 
0^'=' 's 

The Multiplicative Attribute Graph Model (MAGM) of 
Kim and Leskovec (2010) is also obtained by associating 
a bit-vector f{i) with a node i. However, f{i) need not 
be the binary representation of (z — 1) as was the case 
in the KPGM. In fact, f{i) need not even be of length 
log2(n). We simply assume that fk{i) is a Bernoulli ran- 
dom variable with P {fk{i) ~ 1) = /i In addition to 
O defined in (4), the model now has additional parameters 
// := {/i'l', . . . j/.t'^'')}, and the (i,i)-th entry of the 
edge probability matrix Q is written as 



Q^, - n 



fk(i) fk{j)' 



(7) 



k=l 



4 Quilting Algorithm 

A close examination of (6) and (7) reveals that KPGM and 
MAGM are very related. The only difference is that in the 
case of the KPGM the i-th node is mapped to the bit vector 
corresponding to (z — 1) while in the case of MAGM it is 
mapped to an integer (not necessarily {i — 1)) whose bit 
vector representation is f{i). We will call the attribute 
configuration of node i in the sequel. 

For ease of theoretical analysis and in order to convey our 
main ideas we will initially assume that d = log2(n), that 
is, we assume that f{i) is of length log2(ri) or equivalently 
Q < Xi < n (this assumption will be relaxed in Section 
4.2). Under the above assumption, every entry of Q has a 
corresponding counterpart in P because 



Q^ 



P. 



\i Xj ■ 



(8) 



The key difficulty in sampling graphs from MAGM arises 
because the attribute configuration associated with differ- 
ent nodes need not be unique. To sidestep this issue we 
partition the nodes into B sets such that no two nodes in a 
set share the same attribute configuration. 

While a number of partitioning schemes can be used, 
we find the following scheme to be easy to analyze and 
efficient in practice: For each i define the set Zt := 
{j s.t. j <i and A,; ~ Aj}. Clearly \Zi\ counts nodes j 
whose index is smaller than or equal to i and which share 
the same attribute configuration A^. We now define the c-th 
set Dc in our partition as De {i s.t. \Zi\ — c}. No two 
nodes in De share the same attribute configuration. Fur- 
thermore, one can show that the number of sets B is mini- 
mized by this scheme. 
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Figure 2: Partition the MAGM edge probability matrix Q 
into pieces such that no two nodes in a piece share the 
same attribute configuration. 



Theorem 2 (Size Optimality of the Partition) Let the 

partition o/ {1, . . . , n} obtained by the above scheme be 
denoted as Di, D2, ■ ■ ■ , Db- Then, B, the number of 
nonempty sets in the partition, is minimized. 



Proof See Appendix A. 



Using the partition _Di, . . . , _Db, we can partition the edge 
probabiUty matrix Q into B^ sub-matrices (Figure 2): 



)(fc,0 



otherwise. 



permute 



Q 



'(1,1) 



Figure 3: Each piece of the edge probabiHty matrix is per- 
muted to become a sub-matrix of the KPGM edge proba- 
biHty matrix. One can then apply Algorithm 1 to sample 
graphs from this permuted edge probability matrix and re- 
tain the sub-graph of interest. 

Next, by applying a permutation which maps Xi to i we can 
transform each of the B^ sub-matrices of Q into a subma- 
trix of P as illustrated in Figure 3. Formally, define 



Q 



i,0 




if x e Dk,y e Di,i 
otherwise. 



Algorithm 1 can be used to sample graphs from this per- 
muted edge probability matrix with parameters 6. We fil- 



ter the sampled graph to only retain the sub-graph of inter- 
est. Finally, the sampled sub-graphs are un-permuted and 
quilted together to form a sample from the MAGM (see 
Figure 4). Let A"^*^^') denote the adjacency matrix of the 
graph sampled from the edge probability matrix Q''*^''^ via 
Algorithm 1. Define 



A 



(k,i) 



A 



if i e Dk,j e Di,x 
otherwise. 



A, 



y ■ 



A,,- 



The quilted adjacency matrix A is given by 'J2k i 
See Algorithm 2. 



(9) 



k,l) 



^(1,1) 



Ad 



A 



^(2,1) 



^(2,2) 



Figure 4: The sub-graphs sampled from the previous step 
are un-permuted and quilted together to form a graph sam- 
pled from the MAGM. 

Algorithm 2 Sampling Algorithm of Multiplicative At- 
tribute Graphs 

1 

2 
3 
4 
5 
6 
7 



9 
10 
11 
12 
13 
14 



function MAGSampleEdges( 9, /(I), . . . , f{n) ) 
B ^ maxi \Zi\ 
for fc ^ 1 to i? do 
for ; ^ 1 to S do 

£;(fe,0 ^ KPGMSAMPLE(e) 
for each {x, y) € ijC^^') do 

if such that i G Dk, j G D;, and 
X ~ Xi, y = Xj exists then 

end if 
end for 
end for 
end for 
return E 
end function 



Theorem 3 (Correctness) Algorithm 2 samples the en- 
tries of the adjacency matrix A independently with 

1 ^ij = 1 I ©I Ai, . . . , A„ j = Qij. 



Proof See Appendix A 



Hyokun Yun, S.V. N. Vishwanathan 



4.1 Time Complexity 



Since the expected running time of Algorithm 1 is 
0(log2(n)|£'|), the expected time complexity of quilting 
is clearly 0{B^ \og2{n)\E\). The key technical challenge 
in order to establish the efficiency of our scheme is to show 
that with high probability B is small, ideally 0(log2 [n))- 

Balanced Attributes Suppose the distribution of each at- 
tribute is balanced, that is, ^^^'> = /x'^' — ■ . ■ — ^('') = 0.5 
and n — 2"^. Define a random variable X\. = 1 if A; c 
and zero otherwise. Since ^l^^^ = 0.5, it follows that 
^{Xl^\) ^ ^ ^ \. If we let = Y^L^ X^, then 
clearly B = maxc Yc- Since are independent Bernoulli 
random variables, Yc is a binomial random variable which 
converges to a Poisson random variable with parameter 1 as 
n — > oo. Using standard Chernoff bounds for the Poisson 
distribution (see Theorem 5), we can write 



F{Yc>t) < —, and hence 

n 

F{B^ maxYc > t) F[Yc > t] < 

Replacing t by log2(n), 

P(B>log2(n)) < 



c=l 



ne 



3(log2(n))i°g2(«) ■ 



(10) 
(11) 

(12) 



As n ^> oo, (12) goes to (also see Figure 5). Therefore 
we have 

Theorem 4 When ^(i) = ^(2) = ... = ^(d) = o.5, and 
n = 2'^, with high probability the size of partitions B is 
smaller than log2(n). 



Unbalanced Attributes As before, we 

let/i(i) =^(2) ^ 

• • ■ = ^(''^ = ^ and n = 2'^, but now we analyze the 
case when /i ^ 0.5. By transposing some of Q^'^^ if nec- 
essary, it suffices to restrict our attention to /i £ (0.5, 1]. 
We define the random variables XI and Y^. as in the previ- 
ous section. However, now P (^c) depends on the number 
of I's in the binary representation of c. In particular, if 



2"^ = n then P {X^) = ^i°g2(") and y„ 



log2(n) 



Furthermore, P (X^) < ^i°g2(«) for every c ^ n. There- 
fore, when jjL is close to 1 and n is large, B := maxc Yc 
equals y,j = nfj}°^^^'^^ with high probability (see Figure 6). 
The expected running time of our algorithm then becomes 

(„iog2(A')+iiog2(n)|i;|). 

4.2 Handling the Case When n ^ 2'^ 

To simplify the analysis assume /i*-^' — ■ ■ ■ ^ fj,^''') = 0.5. 

n > 2'^ Case First, consider the case n > 2^, and 
let d' := [log2(n)], d" :— [log2(n)J. Each Yc now 



c 
o 



S3 
ft 

o 



25 



20 



15 



10 



— Observed 
■- log2(n) 



2 4 6 

number of nodes n 



■IQS 



Figure 5: Number of nodes vs. size of the partition, when 
^(fc)'s are all set to be 0.5. For each n, we performed 10 
trials and report average values (blue solid line). The red 
dashed line is the bound predicted by (12). Observe that in 
practice, the size of the partition grows much slower than 
0(log2(n)). 



ti = 0.55 



ti = 0.60 










Observed 

- - - 71,('' 







number of nodes 



Figure 6: Number of nodes vs. size of the partition, when 
/iC^J's are all set to be 0.55, 0.60, 0.70, and 0.90. Again, 10 
number of F matrices were sampled for each n and /i, and 
the average size of the partition is taken. For small values 
of n, nji'^ approximation is not tight but the observed value 
is sandwiched between log2(n) and n^jf". For (x > 0.70, 
the nji'^ approximation is tight. 
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converges to the Poisson distribution with parameter 
and using the Chernoff bound again, one can prove that 
B — 0(2'' log2(n)) with high probability, for large n. 
For details, refer to Appendix C. 

Therefore, the expected time complexity is bounded by 
0{{\og2{n))'^d\E\), and the algorithm gets faster as d de- 
creases. 

n < 2'^ Case For the sake of completeness we will 
make some remarks for the case when n < 2'*. However, 
Kim and Leskovec (2010) and Kim and Leskovec (2011) 
report that in practice d « log2(ri) usually results in the 
most realistic graphs. In general, for large d the value of 
B is small, but the number of edges in graphs sampled by 
Algorithm 1 ,which is called by line 5 of Algorithm 2, in- 
creases exponentially with d. Therefore, the overall com- 
plexity of the algorithm is at least il(4'*"'*" E[|i?|]), and 
thus naively applying Algorithm 2 is not advantageous 
when d — d" is high. Our experiments in Section 6.4 con- 
firm this behavior. 

5 Speeding up the Algorithm 

The key to speeding up our algorithm for the case when 
/i^'^' 7^ 0.5 is the following observation: When ^f*^) ap- 
proaches or 1, the number of distinct attribute configu- 
rations reduces significantly. For instance, when j.i'^''^ ap- 
proaches 1 attribute configurations which contain more I's 
in their binary representation are generated with greater fre- 
quency. Similarly, when z^^*^) approaches the attribute 
configurations which contain more O's in their binary rep- 
resentation are preferred. Figure 7 represents this phe- 
nomenon visually. 

We select a number B' (see below) and collect all nodes 
i whose attribute configuration A; occurs at most B' times 
in the set {Ai, A2, . . . , A„} into a set W . Since each at- 
tribute configuration occurs at most B' times in W, the 
sub-graph corresponding to the nodes in W can be sam- 
pled in O (i?'^ log2('^) \E\) by using Algorithm 2. 

We partition the nodes whose attribute configuration occurs 
more than B' times into sets Db. such that the at- 

tribute configuration of each node in Di is the same, say 
A^. The sub-graph corresponding to each Di is an uniform 
random graph with probability of an edge being equal to 
Py ,\' ■ On the other hand, the sub-graph corresponding to 
nodes Di and Dj for i j is also an uniform random graph 
with the probability of an edge being equal to Py.^xr- Fi- 
nally, the sub-graph corresponding to a node i' in W and 
the set Dj is an uniform random graph with the probability 
of an edge being equal to P\ ,,a' ■ These sub-graphs can 
be sampled with 0((| + + 0{dR+\E\), and 
0{dR^ + \E\) effort respectively'. 

' Instead of sampling k i.i.d. Bernoulli random variables 




10" 10^ 10^ 10=^ 10^ 
attribute configuration (ranked) 



Figure 7: We rank attribute configurations based on their 
frequency of occurrence and plot them for different values 
of ^. We fixed d = 15, and n = 2^^ for this plot. When 
/i = 0.5, the graph is very flat since every attribute config- 
uration has the same probability ^ of being sampled. On 
the other hand, when /i = 0.9, the probability mass is very 
concentrated on a small number of configurations. Note 
that this is a log-log plot. 

It remains to discuss how to choose the parameter B' . To- 
wards this end let 

T{B') ^ B'^ log(n) + {\W\ + d)R + dR^ . 

Then, the overall time complexity of our algorithm is 
0{T{B')). We calculate T{B') for every B' , and choose 
the value which minimizes T{B'). Since there are only n 
distinct values of B' , this procedure requires 0{n) time. 

6 Experiments 

We empirically evaluated the efficiency and scalability of 
our sampling algorithm. Our experiments are designed to 
answer the following questions: Does our algorithm pro- 
duce graphs with similar characteristics as observed by 
Kim and Leskovec (2010). How does our algorithm scale 
as a function of n, the number of nodes in the graph. Fur- 
thermore, since our theoretical analysis assumed /i = 0.5 
and d — log2(n) we were interested in the following ad- 
ditional questions: How does the algorithm behave for 
/i 7^ 0.5. How does our algorithm scale when the num- 
ber of features d is different from log2(n). 

Our code is implemented in C-n- and 
will be made available for download from 
http : / / www . Stat . pur due . edu/ -yunS. All 
experiments are run on a machine with a 2.1 GHz pro- 
cessor running Linux. For the first three experiments we 

X\,Xi, . . . , Xk with parameter p, we use a geometric distribu- 
tion with parameter p to sample to generate random variables Kj 
such that 1 < 7^1 < 7^2 < . . . < fc, and set Xk^ = 1. 
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number of nodes (n) number of nodes {n) 
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number of nodes (n) number of nodes (n) 



Figure 8: The number of edges \E\ as a function of the size 
n of the graphs sampled from the MAGM for two different 
values of 9. The near linear rate of growth on the log- 
log plots confirms the observation that \E\ = n'^ for some 
constant c. 



Figure 9: The fraction of nodes in the largest strong com- 
ponent as a function of the size n of the graphs sampled 
from the MAGM for two different values of 9. Asymptot- 
ically, the fraction of edges approaches 1 implying that the 
entire graph is part of the same strong component. 



uniformly set n — 2"^, where n is the number of nodes 
in the graph and d is the dimension of the features. We 
used the same 9 matrices at all levels, that is, we set 
9 = 9(1) = 9(2) = ... ^ qC'). Furthermore, we 
experimented with the following 9 matrices used by 
Kim and Leskovec (2010) and Moreno and Neville (2009) 
respectively: 



9i 



0.15 
0.7 



0.7 

0.85 



and 92 



0.35 
0.52 



0.52 
0.95 



(13) 



6.1 Properties of the Generated Graphs 



Theorem A guarantees that our algorithm generates valid 
graphs from the MAGM. In our first experiment we also 
verify this claim empirically. Towards this end, we set 
fi — 0.5 and generated graphs of size n = 2"^ for various 
values of d. For each n we repeated the sampling proce- 
dure 10 times and studied various properties of the gener- 
ated graphs. As reported by Kim and Leskovec (2010), the 
number of edges in the graphs generated by MAGM 
grow as = n'^ for some constant c. Graph samples gen- 
erated by our algorithm also confirm to this observation, as 
can be seen in Figure 8. Furthermore, Kim and Leskovec 
(2010) report that the proportion of nodes in the largest 
strong component increases asymptotically to 1. We also 
observe this behavior in the samples generated by our al- 
gorithm (see Figure 9). These experiments indeed confirm 
that our algorithm samples valid graphs from the MAGM. 

6.2 ScalabiUty 

To study the scalabihty of our algorithm we fixed /i = 0.5 
and generated 10 graphs of size n = 2'^ for various values 
of d. Figure 10 compares the running time of our algorithm 
vs a naive scheme which uses independent Bernoulli 
trials based on the entries of the edge probability matrix. 

Note that using the naive sampling scheme we could not 
sample graphs with more than 262,144 nodes in less than 8 
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Figure 10: Comparison of running time (in milliseconds) 
of our algorithm vs the naive sampling scheme as a function 
of the size n of the graphs sampled from the MAGM for 
two different values of 9. 



hours. In contrast, the running time of our algorithm grows 
significantly slower than 0{n^) and consequently we were 
able to comfortably sample graphs with a million nodes in 
less than twenty minutes. The largest graphs produced by 
our algorithm contain over 8 million nodes (8,388,608) and 
20 billion edges. In fact these graphs are, to the best of our 
knowledge, at least 32 times larger than the largest MAGM 
graphs reported in literature in terms of number of nodes. 
Furthermore, we observed that our algorithm exhibits the 
same behavior across a range of 9 values (not reported 
here). To further demonstrate the scalability of our algo- 
rithm we plot the running time of our algorithm normalized 
by the number of edges in the graph in Figure 1 1 . Across 
a range of n values, our algorithm spends a constant time 
for each edge that is generated. We can therefore conclude 
that the running time of our algorithm grows empirically as 
Oi\E\). 

6.3 Effect of II 

Our theoretical analysis (Theorem 4) guarantees the scala- 
bility of our sampling algorithm for /i = 0.5. In this section 
we explore empirically how the running time of our algo- 
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Figure 1 1 : Running time per each edge (in milliseconds) 
for our algorithms vs the naive algorithm as a function 
of n the number of nodes. Note that the normalized run- 
ning time of our algorithm is nearly constant and does not 
change as n increases. 





Figure 13: Estimated pmax as a function of the number of 
nodes for two different values of Q. 
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Figure 12: Relative running time for two different val- 
ues of 8. 



rithm varies as we vary /i. Towards this end we define and 
study the relative running time := where r(/i) 

denotes the running time of the algorithm as a function of 
II. In Figure 12 we plot p{fj,) for different values of n = 2'^. 

As expected, our algorithm performs well for /i = 0.5, in 
which case the size of partition B is bounded by log2(?i) 
with high probability. Similarly, when /i w or 1 the at- 
tribute configurations have significantly less diversity and 
hence sampling becomes easy. However, there is also a ten- 
dency that the running time increases as fj, increases. This 
is because the number of edges is also a function of fj,, and 
due to our choice of 8 it is an increasing function. This 
phenomenon is more conspicuous for 82 than 81, since 
611 of the former is larger than that of the latter. 

One may also be interested in pmax := maxo<^<i p{fi)- In 
order to estimate pmax we let /i £ {0.1, 0.2, . . . , 0.9} and 
plotted the worst value of as a function of n the num- 
ber of nodes in Figure 13. In all cases pmax was attained 
for /i — 0.7 or /i = 0.9. It is empirically seen that the fac- 
tor increases as the number of nodes n increases, but 
the speed of growth is reasonably slow such that still the 
sampling of graphs with millions of nodes is feasible for 
any value of p. 



Figure 14: The effect of dimension d on the running time, 
where other parameters are fixed as /i = 0.5 and n = 2^^. 
d = 15 (when n — 2^) is highlighted by the dashed line. 

6.4 Effect of d 

Now we study how the performance of our model varies 
as d changes. In particular we fix n — 2^^ and vary d 
to investigate its effect on running time of our algorithm. 
Figure 14 shows that there is no significant difference in the 
running time for d < log2(n). However, as we explained 
in Section 4.2, the running time of our algorithm increases 
exponentially when d > log2(n). 

7 Conclusion 

We introduced the first sub-quadratic algorithm for effi- 
ciently sampling graphs from the MAGM. Under technical 
conditions, the expected running time of our algorithm is 

O (^(log2(n))^ \E\y Our algorithm is very scalable and is 

able to produce graphs with approximately 8 million nodes 
in under 6 hours. Even when the technical conditions of 
our analysis are not met, our algorithm scales favorably. 
We are currently working on rigorously proving the perfor- 
mance guarantees for the case when ^ ^ 0.5. 

Efficiently sampling MAGM graphs for the case when 
A > log2 (n) remains open. We are currently investigating 
how high-dimensional similarity search techniques such as 
locality sensitive hashing (LSH) or inverse indexing can be 
applied to this problem. 



Hyokun Yun, S.V. N. Vishwanathan 
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A Technical Proofs 

Proof of Theorem 2 Let i' argmaxj \ Zi\. The at- 
tribute configuration A^' corresponding to node i' appears 
at least B times in the set {Ai . . . . , A„}. By the pigeon-hole 
principle any partition of the set {1, . . . , n} which contains 
less than B sets must have a set which contains two nodes 
i and j with attribute configuration A; ~ Xj — Xi> . There- 
fore, the number of sets in the partition must be at least 
B. Our partitioning scheme produces exactly B sets, and 
is therefore optimal. 



Proof of Theorem By definition 

A. 



l<k,l<B 



(14) 



To prove the theorem, we first show that Aij = 
This is straightforward from definition (9), 



A 



'\z.\,\z,\ 

Ai.A, 



since Di, . . . , Db is a partition of nodes and thus i e Dk, 
j G Di for only k = \Zi\ and I = \Zj\. This also implies 



A, = i\ e,Ai,...,A„) =4S:'^'' 



P: 



Qi,j I 



using (8). 



To prove independence, we show that if ^ 

then (A„A,) ^ (A,,, A,0 or (|Z,|, |Z,|) # m\,\Z'^\). 

Since we already showed Aij — Al,^-^ , this implies 
independence of Aij to other entries in A. 

Now, suppose (Ai,Aj) — {Xi>,Xj'), since if it does not 
hold there is nothing to prove. Because ^ j') by 
assumption, at least one of i ^ i' or j ^ j' is true. Without 
loss of generality, suppose i ^ i'. Then, since A,; = A^', 
\Zi\ \Zii I from definition of Zi. 



B Chernoff Bound of Poisson Distribution 



82 (n) 



Then, plugging these into (15), 



(17) 



(18) 



(19) 



c=l 

(20) 



< n 



g-2' . ^2'+i . (|2*+l)2'+'log2(n) 



(2*+ilog2(n))2*+'i°g2(») 
By taking log. 



(21) 



logP (B = maxy^ > 2*+^ log2(ri)) < -2* + 2*+^ log2(n) 

+ (2*+i log2(n)) log2 2*+i - 2*+i log2(n) log2(2*+i log2(n)) 

= -2* + 2*+i log2(n)(l + log2 2*+i - log2(2*+i logn)), 

(22) 

and this goes to — cxd as n oo. Therefore, F[B = 
maxYc > 2*+i log2(n.)] 0. 



Theorem 5 Let X be the random variable which is dis- 
tributed as Poisson distribution of parameter X. Then, 



P{X >x)< 



e-^(eA)^ 



(15) 



C Upper bound of size of the partition when 

As a binomial distribution with finite mean in limit, Yc is 
approximately distributed as a Poisson distribution of pa- 
rameter To use Chernoff bound (15) with A = 
X — 2*+^ log2(n), t = d" — d, we first bound each term: 



(16) 



